Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
The availability of large datasets of organism images combined with advances in artificial intelligence (AI) has significantly enhanced the study of organisms through images, unveiling biodiversity patterns and macro-evolutionary trends. However, existing machine learning (ML)-ready organism datasets have several limitations. First, these datasets often focus on species classification only, overlooking tasks involving visual traits of organisms. Second, they lack detailed visual trait annotations, like pixel-level segmentation, that are crucial for in-depth biological studies. Third, these datasets predominantly feature organisms in their natural habitats, posing challenges for aquatic species like fish, where underwater images often suffer from poor visual clarity, obscuring critical biological traits. This gap hampers the study of aquatic biodiversity patterns which is necessary for the assessment of climate change impacts, and evolutionary research on aquatic species morphology. To address this, we introduce the Fish-Visual Trait Analysis (Fish-Vista) dataset—a large, annotated collection of about 80K fish images spanning 3000 different species, supporting several challenging and biologically relevant tasks including species classification, trait identification, and trait segmentation. These images have been curated through a sophisticated data processing pipeline applied to a cumulative set of images obtained from various museum collections. Fish-Vista ensures that visual traits of images are clearly visible, and provides fine-grained labels of various visual traits present in each image. It also offers pixel-level annotations of 9 different traits for about 7000 fish images, facilitating additional trait segmentation and localization tasks. The ultimate goal of Fish-Vista is to provide a clean, carefully curated, high-resolution dataset that can serve as a foundation for accelerating biological discoveries using advances in AI. Finally, we provide a comprehensive analysis of state-of-the-art deep learning techniques on Fish-Vista.more » « lessFree, publicly-accessible full text available June 15, 2026
-
Abstract Background Identification of genes responsible for anatomical entities is a major requirement in many fields including developmental biology, medicine, and agriculture. Current wet lab techniques used for this purpose, such as gene knockout, are high in resource and time consumption. Protein–protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. Moreover, PPI networks suffer from network quality issues, which can be a limitation for their usage in predicting candidate genes. Therefore, we developed an integrative framework to improve the candidate gene prediction accuracy for anatomical entities by combining existing experimental knowledge about gene-anatomical entity relationships with PPI networks using anatomy ontology annotations. We hypothesized that this integration improves the quality of the PPI networks by reducing the number of false positive and false negative interactions and is better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomical entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These anatomy-based gene networks were semantic networks, as they were constructed based on the anatomy ontology annotations that were obtained from the experimental data in the literature. We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database and compared the performance of their network-based candidate gene predictions. Results According to evaluations of candidate gene prediction performance tested under four different semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks, which were semantically improved PPI networks, showed better performances by having higher area under the curve values for receiver operating characteristic and precision-recall curves than PPI networks for both zebrafish and mouse. Conclusion Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improved the candidate gene prediction accuracy and optimized them for predicting candidate genes for anatomical entities.more » « less
-
Abstract There is a growing body of research on the evolution of anatomy in a wide variety of organisms. Discoveries in this field could be greatly accelerated by computational methods and resources that enable these findings to be compared across different studies and different organisms and linked with the genes responsible for anatomical modifications. Homology is a key concept in comparative anatomy; two important types are historical homology (the similarity of organisms due to common ancestry) and serial homology (the similarity of repeated structures within an organism). We explored how to most effectively represent historical and serial homology across anatomical structures to facilitate computational reasoning. We assembled a collection of homology assertions from the literature with a set of taxon phenotypes for the skeletal elements of vertebrate fins and limbs from the Phenoscape Knowledgebase. Using seven competency questions, we evaluated the reasoning ramifications of two logical models: the Reciprocal Existential Axioms (REA) homology model and the Ancestral Value Axioms (AVA) homology model. The AVA model returned all user-expected results in addition to the search term and any of its subclasses. The AVA model also returns any superclass of the query term in which a homology relationship has been asserted. The REA model returned the user-expected results for five out of seven queries. We identify some challenges of implementing complete homology queries due to limitations of OWL reasoning. This work lays the foundation for homology reasoning to be incorporated into other ontology-based tools, such as those that enable synthetic supermatrix construction and candidate gene discovery. [Homology; ontology; anatomy; morphology; evolution; knowledgebase; phenoscape.]more » « less
-
Abstract Omic BON is a thematic Biodiversity Observation Network under the Group on Earth Observations Biodiversity Observation Network (GEO BON), focused on coordinating the observation of biomolecules in organisms and the environment. Our founding partners include representatives from national, regional, and global observing systems; standards organizations; and data and sample management infrastructures. By coordinating observing strategies, methods, and data flows, Omic BON will facilitate the co-creation of a global omics meta-observatory to generate actionable knowledge. Here, we present key elements of Omic BON's founding charter and first activities.more » « less
An official website of the United States government
